pylspack: Parallel Algorithms and Data Structures for Sketching, Column Subset Selection, Regression, and Leverage Scores

Authors

Abstract

We present parallel algorithms and data structures for three fundamental operations in Numerical Linear Algebra: (i) Gaussian and CountSketch random projections and their combination, (ii) computation of the Gram matrix, and (iii) computation of the squared row norms of the product of two matrices, with a special focus on "tall-and-skinny" matrices, which arise in many applications. We provide a detailed analysis of the ubiquitous CountSketch transform and its combination with Gaussian random projections, accounting for memory requirements, computational complexity, and workload balancing. We also demonstrate how these results can be applied to column subset selection, least squares regression, and leverage scores computation. These tools have been implemented in pylspack, a publicly available Python package (https://github.com/IBM/pylspack) whose core is written in C++ and parallelized with OpenMP, and which is compatible with the standard matrix data structures of SciPy and NumPy. Extensive numerical experiments indicate that the proposed algorithms scale well and significantly outperform existing libraries for tall-and-skinny matrices.
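The operations named in the abstract can be illustrated with a minimal, single-threaded sketch in plain NumPy and SciPy. This is not the pylspack API and not the paper's parallel implementation: the function names below (`countsketch`, `countgauss`, `gram`, `squared_row_norms_of_product`, `approx_leverage_scores`) are hypothetical, the scaling of the Gaussian matrix is one common convention, and the leverage-score routine uses the standard sketch-then-QR approach as an assumed stand-in for the method analyzed in the paper.

```python
import numpy as np
import scipy.sparse as sp


def countsketch(n_rows, sketch_rows, rng):
    """Build a sketch_rows x n_rows CountSketch matrix in CSR format.

    Every column holds exactly one nonzero: a random sign (+1 or -1)
    placed in a uniformly random row.
    """
    buckets = rng.integers(0, sketch_rows, size=n_rows)
    signs = rng.choice([-1.0, 1.0], size=n_rows)
    cols = np.arange(n_rows)
    return sp.csr_matrix((signs, (buckets, cols)), shape=(sketch_rows, n_rows))


def countgauss(A, m_cs, m_gauss, rng):
    """Apply a CountSketch, then a Gaussian projection: G @ (S @ A)."""
    S = countsketch(A.shape[0], m_cs, rng)
    SA = S @ A  # m_cs x d, cheap because S has one nonzero per column
    G = rng.standard_normal((m_gauss, m_cs)) / np.sqrt(m_gauss)
    return G @ SA


def gram(A):
    """Gram matrix A^T A of a tall-and-skinny dense matrix."""
    return A.T @ A


def squared_row_norms_of_product(A, B):
    """Squared Euclidean norm of each row of the product A @ B."""
    AB = A @ B
    return np.einsum("ij,ij->i", AB, AB)


def approx_leverage_scores(A, m_cs, rng):
    """Approximate row leverage scores of a full-column-rank dense A.

    Assumed approach: sketch A, take R from a QR factorization of the
    sketch, and return the squared row norms of A @ inv(R).
    """
    S = countsketch(A.shape[0], m_cs, rng)
    R = np.linalg.qr(S @ A, mode="r")
    # Solve R^T X^T = A^T so that X = A @ inv(R) without forming inv(R).
    X = np.linalg.solve(R.T, A.T).T
    return np.einsum("ij,ij->i", X, X)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5000, 5))  # tall-and-skinny test matrix
    print(countgauss(A, 1000, 50, rng).shape)
    print(gram(A).shape)
    print(approx_leverage_scores(A, 1000, rng).sum())  # close to 5 if the sketch is good
```

Exact leverage scores of a full-column-rank matrix sum to its number of columns, so the printed sum gives a quick sanity check on the quality of the sketch.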

Similar resources

Greedy Column Subset Selection: New Bounds and Distributed Algorithms

The problem of column subset selection has recently attracted a large body of research, with feature selection serving as one obvious and important application. Among the techniques that have been applied to solve this problem, the greedy algorithm has been shown to be quite effective in practice. However, theoretical guarantees on its performance have not been explored thoroughly, especially i...

Column Subset Selection with Missing Data

An important problem in massive data collection systems is to identify a representative subset of collection points from a given set of measurements. This problem is well cast in the literature as selecting representative columns from a low-rank data matrix. However, a challenge that arises immediately is that complete data are typically not available or are infeasible to collect in very large ...

Fast Quantum Algorithms for Least Squares Regression and Statistic Leverage Scores

Quantum algorithms for solving linear systems, and the controversy. The past two decades witnessed the development of quantum algorithms [Mos09], and one recent discovery is quantum speedup for solving linear systems Ax = b for sparse and well-conditioned matrices A ∈ R. Solving linear systems is a ubiquitous computational task, and sparse and well-conditioned matrices form a fairly large class o...

Provably Correct Active Sampling Algorithms for Matrix Column Subset Selection with Missing Data

We consider the problem of matrix column subset selection, which selects a subset of columns from an input matrix such that the input can be well approximated by the span of the selected columns. Column subset selection has been applied to numerous real-world data applications such as population genetics summarization, electronic circuits testing and recommendation systems. In many applications...

Provably Correct Algorithms for Matrix Column Subset Selection with Selectively Sampled Data

We consider the problem of matrix column subset selection, which selects a subset of columns from an input matrix such that the input can be well approximated by the span of the selected columns. Column subset selection has been applied to numerous real-world data applications such as population genetics summarization, electronic circuits testing and recommendation systems. In many applications...

Journal

Journal title: ACM Transactions on Mathematical Software

Year: 2022

ISSN: 0098-3500, 1557-7295

DOI: https://doi.org/10.1145/3555370